https://github.com/jamesha95/IHAD-mapping.git

Background

In 2019, the ABS released the experimental Index of Household Advantage and Disadvantage. This variable assigns a socio-economic score to each eligible household in the 2016 Census, and then allocates the households to a quartile (not population-weighted).

This variable is not yet available in TableBuilder for cross-classification (as of March 25th, 2019). This is unfortunate; it would be very helpful to be able to investigate the relationship between socio-economic disadvantage and other characteristics.

So far, the most disaggregated information available is only the percentage of households in each quartile, at the SA1 level. In this script, we will map the proportion of households in the bottom quartile for each SA1, and then aggregate to SA2 (weighting by household, and then by population to see if there’s much difference). We may then further compare to SEIFA indices.

IHAD may end up being nothing more than a slightly more nuanced version of SEIFA, where statistical areas are described not by one score but by four (that is, the proportion of households in each IHAD quartile). But hopefully it becomes available in TableBuilder in the near future to allow easy analysis of socio-economic disadvantage without the ecological validity issues of SEIFA.

To begin, we import the data from the ABS website. We’ve named the file sa1-ihad-percentages.xls and stored it in the folder named data.

# Packages first

library(devtools)
#install_github("wfmackey/absmapsdata")

library(tidyverse)
library(sf)
library(absmapsdata)
library(grattantheme)
library(sp)
library(data.table)
library(ASGS)
library(tmap)
library(leaflet)
library(data.table)
library(magrittr)

# Now we read in the data from the website

download.file(url = "https://www.abs.gov.au/AUSSTATS/subscriber.nsf/log?openagent&statistical%20area%20level%201,%20percentage%20of%20households,%20ihad%202016.xls&4198.0&Data%20Cubes&DE8CAFA6829E80E6CA2583AD00106126&0&2016&26.02.2019&Latest",destfile = "ihad-percentages-sa1.xls")

data <- readxl::read_xls(path = "ihad-percentages-sa1.xls", 
                         sheet = "Table 1", 
                         skip = 5, 
                         col_names = TRUE)

# Now we tidy the column names 

data <- data %>%
  
# We note that column one is the 7-digit SA1 code, which is unhelpful and will be removed
  
  select(-...1) %>%
  
# And now we give sensible names to the remaining columns
  
  rename(sa1_main_2016 = ...2,
         percentage_q1 = `Quartile 1`,
         percentage_q2 = `Quartile 2`,
         percentage_q3 = `Quartile 3`,
         percentage_q4 = `Quartile 4`,
         dwellings_in_scope = ...7,
         SEIFA_quartile = ...8,
         population = ...9, 
         dwellings_total = ...10)

Combining with map data

We now append this data to the simple features ABS map data synthesised by Will Mackey.

We’ve wrangled, now let us plot! (Note that SA1s are actually very small, making this map somewhat unhelpful. Skipping over it…)

map_sa1_melb <- 
  ggplot(data = ihad_sa1_melb) +
  geom_sf(aes(geometry = geometry,
              fill = percentage_q1),
          lwd = 0) +
  scale_fill_gradientn(colours = grattan_pal(6)) +
  theme_grattan() +
  theme(legend.position = "right",
        legend.direction = "vertical") +
  coord_sf(datum = NA) +
  labs(title = "Disadvantage across Greater Melbourne", 
       subtitle = "Percentage of households in the bottom IHAD quartile",
       fill = "Percentage")

map_sa1_syd <- 
  ggplot(data = ihad_sa1_syd) +
  geom_sf(aes(geometry = geometry,
              fill = percentage_q1),
          lwd = 0) +
  scale_fill_gradientn(colours = grattan_pal(6)) +
  theme_grattan() +
  theme(legend.position = "right",
        legend.direction = "vertical") +
  coord_sf(datum = NA) +
  labs(title = "Disadvantage across Greater Sydney", 
       subtitle = "Percentage of households in the bottom IHAD quartile",
       fill = "Percentage")  

map_sa1_melb
# map_sa1_syd

Aggregating the data to SA2s or SA3s

Now that we’ve mapped the SA1s, we can see that several areas are unpopulated. We will now aggregate to the SA2 level, and try to increase interactivity in the map (hover to see the SA2 name).

The proportion of households facing disadvantage (i.e. in the lowest quartile) can be determined by summing the number of disadvantaged households in each SA2 and dividing by the total number of in-scope households.

# We need to correspond the sa1 data to sa2s, then append to sa2 geometry.

corresp_sa1_to_sa2 <- tibble(sa1_main_2016 = sa12016$sa1_main_2016, 
                             sa2_main_2016 = sa12016$sa2_main_2016)

ihad_agg <- data %>% 
  full_join(corresp_sa1_to_sa2, by = "sa1_main_2016") %>% 
  mutate(n_q1 = percentage_q1*dwellings_in_scope/100,
         n_q2 = percentage_q2*dwellings_in_scope/100,
         n_q3 = percentage_q3*dwellings_in_scope/100,
         n_q4 = percentage_q4*dwellings_in_scope/100) %>% 
  group_by(sa2_main_2016) %>%
  summarise(n_q1 = sum(n_q1, na.rm = TRUE),
            n_q2 = sum(n_q2, na.rm = TRUE),
            n_q3 = sum(n_q3, na.rm = TRUE),
            n_q4 = sum(n_q4, na.rm = TRUE),
            dwellings_in_scope = sum(dwellings_in_scope, na.rm = TRUE),
            dwellings_total = sum(dwellings_total, na.rm = TRUE),
            population = sum(population, na.rm = TRUE)) %>% 
  mutate(percentage_q1 = 100*n_q1/dwellings_in_scope,
         percentage_q2 = 100*n_q2/dwellings_in_scope,
         percentage_q3 = 100*n_q3/dwellings_in_scope,
         percentage_q4 = 100*n_q4/dwellings_in_scope,
         approx_number_q1_households = round(percentage_q1*dwellings_total/100),
         approx_number_q2_households = round(percentage_q2*dwellings_total/100),
         approx_number_q3_households = round(percentage_q3*dwellings_total/100),
         approx_number_q4_households = round(percentage_q4*dwellings_total/100)) %>% 
  full_join(sa22016, by = "sa2_main_2016")
  
# One SA2 is a clear outlier - Braeside in Melbourne (100%). It has a low sample size, about 11 households. We exclude those SA2s with more than 60% of households in the bottom quartile.

ihad_sa2_melb <- ihad_agg %>% 
  filter(gcc_name_2016 == "Greater Melbourne")

ihad_sa2_melb_trim <- ihad_sa2_melb %>%
  mutate(percentage_q1 = ifelse(dwellings_total >= 100, round(percentage_q1, 2), NA)) # removes the shading from SA2s with low household counts

ihad_sa2_syd <- ihad_agg %>% 
  filter(gcc_name_2016 == "Greater Sydney")

ihad_sa2_syd_trim <- ihad_sa2_syd %>%
  mutate(percentage_q1 = ifelse(dwellings_total >= 100, round(percentage_q1, 2), NA)) # removes the shading from SA2s with low household counts

ihad_sa2_bris <- ihad_agg %>% 
  filter(gcc_name_2016 == "Greater Brisbane")

ihad_sa2_bris_trim <- ihad_sa2_bris %>%
  mutate(percentage_q1 = ifelse(dwellings_total >= 100, round(percentage_q1, 2), NA)) # removes the shading from SA2s with low household counts

And aggregating to SA3s now:

# We need to correspond the sa1 data to sa2s, then append to sa2 geometry.

corresp_sa1_to_sa3 <- tibble(sa1_main_2016 = sa12016$sa1_main_2016, 
                             sa3_code_2016 = sa12016$sa3_code_2016)

ihad_agg_sa3 <- data %>% 
  full_join(corresp_sa1_to_sa3, by = "sa1_main_2016") %>% 
  mutate(n_q1 = percentage_q1*dwellings_in_scope/100,
         n_q2 = percentage_q2*dwellings_in_scope/100,
         n_q3 = percentage_q3*dwellings_in_scope/100,
         n_q4 = percentage_q4*dwellings_in_scope/100) %>% 
  group_by(sa3_code_2016) %>%
  summarise(n_q1 = sum(n_q1, na.rm = TRUE),
            n_q2 = sum(n_q2, na.rm = TRUE),
            n_q3 = sum(n_q3, na.rm = TRUE),
            n_q4 = sum(n_q4, na.rm = TRUE),
            dwellings_in_scope = sum(dwellings_in_scope, na.rm = TRUE),
            dwellings_total = sum(dwellings_total, na.rm = TRUE),
            population = sum(population, na.rm = TRUE)) %>% 
  mutate(percentage_q1 = n_q1/dwellings_in_scope*100,
         percentage_q2 = n_q2/dwellings_in_scope*100,
         percentage_q3 = n_q3/dwellings_in_scope*100,
         percentage_q4 = n_q4/dwellings_in_scope*100,
         approx_number_q1_households = round(percentage_q1*dwellings_total/100),
         approx_number_q2_households = round(percentage_q2*dwellings_total/100),
         approx_number_q3_households = round(percentage_q3*dwellings_total/100),
         approx_number_q4_households = round(percentage_q4*dwellings_total/100)
         ) %>% 
  full_join(sa32016, by = "sa3_code_2016")
  
#

ihad_sa3_melb <- ihad_agg_sa3 %>% 
  filter(gcc_name_2016 == "Greater Melbourne")

ihad_sa3_syd <- ihad_agg_sa3 %>% 
  filter(gcc_name_2016 == "Greater Sydney")

ihad_sa3_bris <- ihad_agg_sa3 %>% 
  filter(gcc_name_2016 == "Greater Brisbane")

Having wrangled, now we plot. Note that outlying SA2s have really ruined our scale.

#I've set `results="show"` because without it, the interactive map won't print

map_sa2_melb <- 
  ggplot(data = ihad_sa2_melb) +
  geom_sf(aes(geometry = geometry,
              fill = percentage_q1),
          lwd = 0) +
  geom_point(aes(x = cent_lat,
                 y = cent_long,
                 size = approx_number_q1_households),
             colour = "black",
             alpha = 0.5) +
  scale_size(range = c(0.1, 1)) +
  scale_fill_gradientn(colours = grattan_pal(6)) +
  theme_grattan() +
  theme(legend.position = "right",
        legend.direction = "vertical",
        axis.title = element_blank()) +
  coord_sf(datum = NA, crs = "+init=epsg:4238") +
  labs(title = "Disadvantage across Greater Melbourne", 
       subtitle = "Percentage of households in the bottom IHAD quartile",
       fill = "Percentage")
  
map_sa2_melb

# Re-ordering columns so that the label is the name, not the code
ihad_sa2_melb_leaflet <- ihad_sa2_melb %>% 
  select(sa2_name_2016, everything())

# getting the tibble ready for use in tmap
sf_sa2_melb <- st_as_sf(ihad_sa2_melb_leaflet)

tmap_mode("view")

int_map <- tm_shape(sf_sa2_melb) + 
  tm_fill(col = "percentage_q1", 
          palette = grattan_pal(6),
          alpha = 0.5) +
  tm_bubbles(size = "approx_number_q1_households", 
             col = "black", 
             border.col = "white", 
             alpha = 0.5) +
  tm_borders(col = grattan_grey1, lwd = 0.5)

int_map

Next we’ll try making a map with a dot plotted for every 50 or so households in the bottom quartile.

#  Let's try plotting one dot for every 50 households
#  Note that we do this only for Melbourne, as it's not our preferred technique (which is sampling from SA1s, in a later chunk)


perNhh <- 50
pointCollector <- list()

# Now let's remove pesky low-sample SA2s by deleting their percentage value for the bottom quartile. This will still 
# allow us to plot them, but their fill can be set to grey. 

ihad_sa2_melb_trim <- ihad_sa2_melb %>%
  mutate(percentage_q1 = ifelse(dwellings_total >= 100, percentage_q1, NA))

melb_sf <- SA2_2016
temp <- SA2_2016@data %>% as.data.table()
melb_sf@data <- temp[ , SA2_MAIN16 := SA2_MAIN16 %>% as.character()]
melb_sf <- melb_sf[melb_sf$GCC_NAME16 == "Greater Melbourne",]

for(i in ihad_sa2_melb_trim$sa2_main_2016){
  sa2_shape_frame <- melb_sf[melb_sf$SA2_MAIN16 == i, ]
  if(nrow(sa2_shape_frame) < 1){  next()  }
  
  npts <- floor(ihad_sa2_melb_trim$approx_number_q1_households[ihad_sa2_melb_trim$sa2_main_2016 == i]/perNhh)
  if(is.na(npts)){next()}
  if(npts == 0){next()}
  pts <- data.frame(spsample(x = sa2_shape_frame, 
                             n = npts, 
                             type = "random", 
                             iter = 10)@coords) # iter(default = 4): number of times to try to place sample points in a polygon before giving up and returning NULL - this may occur when trying to hit a small and awkwardly shaped polygon in a large bounding box with a small number of points.
  pointCollector[[i]] <- pts
}

pointFrame <- bind_rows(pointCollector, .id = "sa2_main")

And once again, we plot.

map_sa2_melb_dots <- ggplot(ihad_sa2_melb_trim)
map_sa2_melb_dots <- map_sa2_melb_dots + theme_void()
map_sa2_melb_dots <- map_sa2_melb_dots + geom_sf(aes(geometry = geometry, 
                                                     fill = percentage_q1),
                                                 lwd = 0)
map_sa2_melb_dots <- map_sa2_melb_dots + scale_fill_gradientn(colours = grattan_pal(6))
map_sa2_melb_dots <- map_sa2_melb_dots + geom_point(data = pointFrame,
                                                    aes(x = x, y = y),
                                                    shape = ".",
                                                    colour = "BLACK",
                                                    alpha = 1/2)
map_sa2_melb_dots <- map_sa2_melb_dots + theme(legend.position = "right", 
                                               legend.direction = "vertical",
                                               legend.text = element_text(size = 12),
                                               axis.title = element_blank())
map_sa2_melb_dots <- map_sa2_melb_dots + coord_sf(datum = NA, crs = "+init=epsg:4238")
map_sa2_melb_dots <- map_sa2_melb_dots + labs(title = "Disadvantage across Greater Melbourne", 
                                              subtitle = "Percentage of households in the bottom IHAD quartile",
                                              fill = "Percentage",
                                              shape = "50 households")

map_sa2_melb_dots

Now we’re talking! This is much more informative. Next, let’s try sampling the households within SA1s, but plotting onto the SA2 map. This ensures more accurate house-dot location.

#  Let's try plotting one dot for every 50 households 
# (works but very concentrated dots in inner-Melb). 

# Now let's try 100 
# (too few dots, SA1s are too small...)

# Okay let's go back to 50 but use a ceiling function
# Better, but a hist of approx_number_households shows that the average number is only 38, and the distribution is extremely skewed

# so let's try 10 with a `round` instead of a ceiling or floor. This could be too much
# and yeah, it's a lot

# next is to try 20
# still a lot

# we settle on 50 with a round feature (so SA1s with at least 25 households get a dot)

perNhh <- 50
pointCollector <- list()

# Create a shapefile for each capital
sa1_sf <- SA1_2016
temp <- SA1_2016@data %>% as.data.table()
sa1_sf@data <- temp[ , SA1_MAIN16 := SA1_MAIN16 %>% as.character()]
melb_sa1_sf <- sa1_sf[sa1_sf$GCC_NAME16 == "Greater Melbourne",]
syd_sa1_sf <- sa1_sf[sa1_sf$GCC_NAME16 == "Greater Sydney",]
bris_sa1_sf <- sa1_sf[sa1_sf$GCC_NAME16 == "Greater Brisbane",]

# We repeat the below code for each city, sampling household dots from the SA1s

# --- Melbourne ---

for(i in ihad_sa1_melb$sa1_main_2016){
  sa1_shape_frame <- melb_sa1_sf[melb_sa1_sf$SA1_MAIN16 == i, ]
  if(nrow(sa1_shape_frame) < 1){  next()  }
  if(is.na(ihad_sa1_melb$approx_number_q1_households[ihad_sa1_melb$sa1_main_2016 == i])){next()}
  if(ihad_sa1_melb$approx_number_q1_households[ihad_sa1_melb$sa1_main_2016 == i] == 0){next()}
  npts <- round(ihad_sa1_melb$approx_number_q1_households[ihad_sa1_melb$sa1_main_2016 == i]/perNhh)
  if(npts == 0){next()}
  pts <- data.frame(spsample(x = sa1_shape_frame,
                             n = npts,
                             type = "random",
                             iter = 10)@coords) # iter(default = 4): number of times to try to place sample points in a polygon before giving up and returning NULL - this may occur when trying to hit a small and awkwardly shaped polygon in a large bounding box with a small number of points.
  pointCollector[[i]] <- pts
}

pointFrameSA1melb <- bind_rows(pointCollector, .id = "sa1_main")
pointCollector <- list() # cleaning up after oneself
dim(pointFrameSA1melb) # just some checks
approx_total_q1_households <- ihad_sa1_melb$approx_number_q1_households %>% sum(na.rm = TRUE) %>% divide_by(perNhh)
percent_covered_by_dots <- dim(pointFrameSA1melb)[1]/approx_total_q1_households
percent_covered_by_dots
#head(pointFrameSA1melb)

# --- Sydney ---

for(i in ihad_sa1_syd$sa1_main_2016){
  sa1_shape_frame <- syd_sa1_sf[syd_sa1_sf$SA1_MAIN16 == i, ]
  if(nrow(sa1_shape_frame) < 1){  next()  }
  if(is.na(ihad_sa1_syd$approx_number_q1_households[ihad_sa1_syd$sa1_main_2016 == i])){next()}
  if(ihad_sa1_syd$approx_number_q1_households[ihad_sa1_syd$sa1_main_2016 == i] == 0){next()}
  npts <- round(ihad_sa1_syd$approx_number_q1_households[ihad_sa1_syd$sa1_main_2016 == i]/perNhh)
  if(npts == 0){next()}
  pts <- data.frame(spsample(x = sa1_shape_frame,
                             n = npts,
                             type = "random",
                             iter = 10)@coords) # iter(default = 4): number of times to try to place sample points in a polygon before giving up and returning NULL - this may occur when trying to hit a small and awkwardly shaped polygon in a large bounding box with a small number of points.
  pointCollector[[i]] <- pts
}

pointFrameSA1syd <- bind_rows(pointCollector, .id = "sa1_main")
pointCollector <- list() # cleaning up after oneself
dim(pointFrameSA1syd) # just some checks
approx_total_q1_households <- ihad_sa1_syd$approx_number_q1_households %>% sum(na.rm = TRUE) %>% divide_by(perNhh)
percent_covered_by_dots <- dim(pointFrameSA1syd)[1]/approx_total_q1_households
percent_covered_by_dots # this tells us whether our choice to simulate from sa1s with >24 households has led to many homes being ignored. But it's not too bad, >90% are represented. 
#head(pointFrameSA1syd)

# --- Brisbane ---

for(i in ihad_sa1_bris$sa1_main_2016){
  sa1_shape_frame <- bris_sa1_sf[bris_sa1_sf$SA1_MAIN16 == i, ]
  if(nrow(sa1_shape_frame) < 1){  next()  }
  if(is.na(ihad_sa1_bris$approx_number_q1_households[ihad_sa1_bris$sa1_main_2016 == i])){next()}
  if(ihad_sa1_bris$approx_number_q1_households[ihad_sa1_bris$sa1_main_2016 == i] == 0){next()}
  npts <- round(ihad_sa1_bris$approx_number_q1_households[ihad_sa1_bris$sa1_main_2016 == i]/perNhh)
  if(npts == 0){next()}
  pts <- data.frame(spsample(x = sa1_shape_frame,
                             n = npts,
                             type = "random",
                             iter = 10)@coords) # iter(default = 4): number of times to try to place sample points in a polygon before giving up and returning NULL - this may occur when trying to hit a small and awkwardly shaped polygon in a large bounding box with a small number of points.
  pointCollector[[i]] <- pts
}

pointFrameSA1bris <- bind_rows(pointCollector, .id = "sa1_main")
pointCollector <- list() # cleaning up after oneself
dim(pointFrameSA1bris) # just some checks
approx_total_q1_households <- ihad_sa1_bris$approx_number_q1_households %>% sum(na.rm = TRUE) %>% divide_by(perNhh)
percent_covered_by_dots <- dim(pointFrameSA1bris)[1]/approx_total_q1_households
percent_covered_by_dots
#head(pointFrameSA1bris)

The computation time was quite a bit longer for these simulations.

Now let’s plot Melbourne.

# 

map_sa2_melb_dots <- ggplot(ihad_sa2_melb_trim)
map_sa2_melb_dots <- map_sa2_melb_dots + theme_void()
map_sa2_melb_dots <- map_sa2_melb_dots + geom_sf(aes(geometry = geometry, 
                                                     fill = percentage_q1),
                                                 lwd = 0)
map_sa2_melb_dots <- map_sa2_melb_dots + scale_fill_gradientn(colours = grattan_pal(6))
map_sa2_melb_dots <- map_sa2_melb_dots + geom_point(data = pointFrameSA1melb,
                                                    aes(x = x, y = y),
                                                    shape = ".",
                                                    colour = "BLACK",
                                                    alpha = 1/2)
# I don't know why, but some sampled dots have coordinates (x,y) and some have coords (coords.x1, coords.x2). I should investigate and fix, but for now I'll just plot both.
map_sa2_melb_dots <- map_sa2_melb_dots + geom_point(data = pointFrameSA1melb,
                                                    aes(x = coords.x1, y = coords.x2),
                                                    shape = ".",
                                                    colour = "BLACK",
                                                    alpha = 1/2)
map_sa2_melb_dots <- map_sa2_melb_dots + theme(legend.position = "right", 
                                               legend.direction = "vertical",
                                               legend.text = element_text(size = 12),
                                               axis.title = element_blank())
map_sa2_melb_dots <- map_sa2_melb_dots + coord_sf(datum = NA, crs = "+init=epsg:4238")
map_sa2_melb_dots <- map_sa2_melb_dots + labs(title = "Disadvantage across Greater Melbourne", 
                                              subtitle = "Percentage of households in the bottom IHAD quartile",
                                              fill = "Percentage",
                                              shape = "50 households")

map_sa2_melb_dots 

Nice; the actual plotting was very quick using ggplot.

Now we’re getting fancy. We’re simulating dots for groups of households at the SA1 level to ensure locational accuracy, but plotting them onto SA3s so that we can see the underlying colour better.

# 

map_melb_sa3_dots_sa1 <- ggplot(ihad_sa3_melb)
map_melb_sa3_dots_sa1 <- map_melb_sa3_dots_sa1 + theme_void()
map_melb_sa3_dots_sa1 <- map_melb_sa3_dots_sa1 + geom_sf(aes(geometry = geometry, 
                                                     fill = percentage_q1),
                                                 lwd = 0)
map_melb_sa3_dots_sa1 <- map_melb_sa3_dots_sa1 + scale_fill_gradientn(colours = grattan_pal(6))
map_melb_sa3_dots_sa1 <- map_melb_sa3_dots_sa1 + geom_point(data = pointFrameSA1melb,
                                                    aes(x = x, y = y),
                                                    shape = ".",
                                                    colour = "BLACK",
                                                    alpha = 1/2)
# I don't know why, but some sampled dots have coordinates (x,y) and some have coords (coords.x1, coords.x2). I should investigate and fix, but for now I'll just plot both.
map_melb_sa3_dots_sa1 <- map_melb_sa3_dots_sa1 + geom_point(data = pointFrameSA1melb,
                                                    aes(x = coords.x1, y = coords.x2),
                                                    shape = ".",
                                                    colour = "BLACK",
                                                    alpha = 1/2)
map_melb_sa3_dots_sa1 <- map_melb_sa3_dots_sa1 + theme(legend.position = "right", 
                                               legend.direction = "vertical",
                                               legend.text = element_text(size = 12),
                                               axis.title = element_blank(),
                                               plot.title = element_text(hjust = 0.5), 
                                               plot.subtitle = element_text(hjust = 0.5),
                                               plot.caption = element_text(hjust = 0.5)
                                               )
                                              
map_melb_sa3_dots_sa1 <- map_melb_sa3_dots_sa1 + coord_sf(datum = NA, crs = "+init=epsg:4238")
map_melb_sa3_dots_sa1 <- map_melb_sa3_dots_sa1 + labs(title = "Disadvantage across Greater Melbourne", 
                                              subtitle = "Percentage of households in the bottom IHAD quartile",
                                              fill = "Percentage",
                                              caption = "Each dot represents 50 households.\nBased on Grattan Institute analysis of ABS data.")

map_melb_sa3_dots_sa1

# grattan_save("Melbourne_Disadvantaged_Households.png", 
#              object = map_melb_sa3_dots_sa1,
#              type = "fullslide")

And lastly, I want to build an interactive version of the above map, with roads visible.

We may also want to do some spatial analysis, such as distance from nearest school or an overlap of the congestion scheme region and the number of disadvantaged households.

# Removing the SA2s with few households (such as Braeside) and getting the file ready for tmap
sf_sa2_melb_trim <- ihad_sa2_melb %>% 
  select(sa2_name_2016, everything()) %>% 
  mutate(percentage_of_bottom_quartile_households = ifelse(dwellings_total >= 100, round(percentage_q1, 1), NA),
         approx_number_disadvantaged_households = round(approx_number_q1_households),
         note = "One dot represents 50 households in the bottom quartile.") %>% 
  st_as_sf()

# and let's add an SA1 map below too.

sf_sa1_melb <- ihad_sa1_melb %>% 
  st_as_sf()

# now we convert the household locations into longitude and latitude

points_to_plot_melb <- pointFrameSA1melb %>% 
  mutate(x = ifelse(is.na(x), coords.x1, x),
         y = ifelse(is.na(y), coords.x2, y)) %>% 
  select(-c(coords.x1, coords.x2)) %>% 
   st_as_sf(coords = c('x', 'y'))

# and we plot!

tmap_mode("view")

int_map_sa1_melb <- tm_shape(sf_sa2_melb_trim) + 
  tm_fill(col = "percentage_of_bottom_quartile_households", 
          palette = grattan_pal(6),
          alpha = 0.5,
          popup.vars = c("sa2_name_2016", "sa3_name_2016", "percentage_of_bottom_quartile_households", "approx_number_disadvantaged_households", "note")) +
  tm_borders(col = grattan_grey1, lwd = 0.5) +
  
  tm_shape(sf_sa1_melb) + # this includes an SA1 map that can be turned on or off. But it also makes the file much larger and a bit slower...
  tm_borders(col = "red") +
  
  tm_shape(points_to_plot_melb) +
  tm_dots(col = "black",
          size = 0.0001, #set to 0.1 or 0.01 if you turn on dot.size.fixed 
          shape = 16) +  #picks black circles
  
  tm_view(dot.size.fixed = FALSE) # prevents dots from scaling as you zoom

int_map_sa1_melb
tmap_save(tm = int_map_sa1_melb, filename = "Interactive_SA1_map_of_disadvantage_in_Melbourne.html")
# Removing the SA2s with few households and getting the file ready for tmap
sf_sa2_syd_trim <- ihad_sa2_syd %>% 
  select(sa2_name_2016, everything()) %>% 
  mutate(percentage_of_bottom_quartile_households = ifelse(dwellings_total >= 100, round(percentage_q1, 1), NA),
         approx_number_disadvantaged_households = round(approx_number_q1_households),
         note = "One dot represents 50 households in the bottom quartile.") %>% 
  st_as_sf()

# and let's add an SA1 map below too.

sf_sa1_syd <- ihad_sa1_syd %>% 
  st_as_sf()

# now we convert the household locations into longitude and latitude

points_to_plot_syd <- pointFrameSA1syd %>% 
  mutate(x = ifelse(is.na(x), coords.x1, x),
         y = ifelse(is.na(y), coords.x2, y)) %>% 
  select(-c(coords.x1, coords.x2)) %>% 
   st_as_sf(coords = c('x', 'y'))

# and we plot!

tmap_mode("view")

int_map_sa1_syd <- tm_shape(sf_sa2_syd_trim) + 
  tm_fill(col = "percentage_of_bottom_quartile_households", 
          palette = grattan_pal(6),
          alpha = 0.5,
          popup.vars = c("sa2_name_2016", "sa3_name_2016", "percentage_of_bottom_quartile_households", "approx_number_disadvantaged_households", "note")) +
  tm_borders(col = grattan_grey1, lwd = 0.5) +
  
  tm_shape(sf_sa1_syd) + # this includes an SA1 map that can be turned on or off. But it also makes the file much larger and a bit slower. 
  tm_borders(col = "red") +
  
  tm_shape(points_to_plot_syd) +
  tm_dots(col = "black",
          size = 0.0001, #set to 0.1 or 0.01 if you turn on dot.size.fixed 
          shape = 16) +  #picks black circles
  
  tm_view(dot.size.fixed = FALSE) # prevents dots from scaling as you zoom

int_map_sa1_syd

 tmap_save(tm = int_map_sa1_syd, filename = "Interactive_SA1_map_of_disadvantage_in_Sydney.html")
# Removing the SA2s with few households (such as Braeside) and getting the file ready for tmap
sf_sa2_bris_trim <- ihad_sa2_bris %>% 
  select(sa2_name_2016, everything()) %>% 
  mutate(percentage_of_bottom_quartile_households = ifelse(dwellings_total >= 100, round(percentage_q1, 1), NA),
         approx_number_disadvantaged_households = round(approx_number_q1_households),
         note = "One dot represents 50 households in the bottom quartile.") %>% 
  st_as_sf()

# and let's add an SA1 map below too.

sf_sa1_bris <- ihad_sa1_bris %>% 
  st_as_sf()

# now we convert the household locations into longitude and latitude

points_to_plot_bris <- pointFrameSA1bris %>% 
  mutate(x = ifelse(is.na(x), coords.x1, x),
         y = ifelse(is.na(y), coords.x2, y)) %>% 
  select(-c(coords.x1, coords.x2)) %>% 
   st_as_sf(coords = c('x', 'y'))

# and we plot!

tmap_mode("view")

int_map_sa1_bris <- tm_shape(sf_sa2_bris_trim) + 
  tm_fill(col = "percentage_of_bottom_quartile_households", 
          palette = grattan_pal(6),
          alpha = 0.5,
          popup.vars = c("sa2_name_2016", "sa3_name_2016", "percentage_of_bottom_quartile_households", "approx_number_disadvantaged_households", "note")) +
  tm_borders(col = grattan_grey1, lwd = 0.5) +
  
  tm_shape(sf_sa1_bris) + # this includes an SA1 map that can be turned on or off. But it also makes the file much larger and a bit slower. 
  tm_borders(col = "red") +
  
  tm_shape(points_to_plot_bris) +
  tm_dots(col = "black",
          size = 0.0001, #set to 0.1 or 0.01 if you turn on dot.size.fixed 
          shape = 16) +  #picks black circles
  
  tm_view(dot.size.fixed = FALSE) # prevents dots from scaling as you zoom

int_map_sa1_bris

 tmap_save(tm = int_map_sa1_bris, filename = "Interactive_SA1_map_of_disadvantage_in_Brisbane.html")

Mapping advantage and disadvantage across capital cities

So we tried to simulate households at the SA1 level across Australia, and it was not productive. The output files were around 100MB and too laggy to use.

We then tried simulating at the SA3 level, with one dot per 100 households. The files were still ~30MB and laggy, and the location of households was not accurate, especially in the outback.

# 20 is wayyyyy to many, it makes the file ~100MB for all of Australia.
# Even 50 crashed it; the simulation took 20mins, and then we got an error saying that the file couldn't be opened...
# We should abandon simulation at the SA1 level, and maybe try simulating 100 households at the SA3 level if we really want a national map.

perNhh <- 100
pointCollector <- list()

# Create one national shapefile
sa3_sf <- SA3_2016
temp <- SA3_2016@data %>% as.data.table()
sa3_sf@data <- temp[ , SA3_CODE16 := SA3_CODE16 %>% as.character()]
sa3_sf <- sa3_sf[sa3_sf$AREASQKM16 > 10^-4, ] # remove those pesky SA1s with zero area (like migratory - offshore or no usual address)


# Let's try to simulate the whole country at once and see if it works. 
# okay so the whole country fails to converge even with iter = 100. Let's restrict it to just the capital cities.

ihad_agg_sa3 <- ihad_agg_sa3 %>%
  filter(areasqkm_2016 > 10^-4)

for(i in ihad_agg_sa3$sa3_code_2016){
  sa3_shape_frame <- sa3_sf[sa3_sf$SA3_CODE16 == i, ]
  if(nrow(sa3_shape_frame) < 1){  next()  }
  npts_q1 <- round(ihad_agg_sa3$approx_number_q1_households[ihad_agg_sa3$sa3_code_2016 == i]/perNhh)
  if(is.na(npts_q1)){next()} 
  if(npts_q1 == 0){next()}
  
  pts_q1 <- data.frame(spsample(x = sa3_shape_frame,
                             n = npts_q1,
                             type = "random",
                             iter = 1000)@coords) 
  pointCollector[[i]] <- pts_q1
}
pointFrameq1 <- bind_rows(pointCollector, .id = "sa3_code") %>% 
  mutate(IHAD_quartile = as.factor(1))
pointCollector <- list() # cleaning up after oneself

for(i in ihad_agg_sa3$sa3_code_2016){
  sa3_shape_frame <- sa3_sf[sa3_sf$SA3_CODE16 == i, ]
  if(nrow(sa3_shape_frame) < 1){  next()  }
  npts_q2 <- round(ihad_agg_sa3$approx_number_q2_households[ihad_agg_sa3$sa3_code_2016 == i]/perNhh)
  if(is.na(npts_q2)){next()} 
  if(npts_q2 == 0){next()}
  pts_q2 <- data.frame(spsample(x = sa3_shape_frame,
                             n = npts_q2,
                             type = "random",
                             iter = 1000)@coords) 
  pointCollector[[i]] <- pts_q2
}
pointFrameq2 <- bind_rows(pointCollector, .id = "sa3_code")%>% 
  mutate(IHAD_quartile = as.factor(2))
pointCollector <- list() # cleaning up after oneself

for(i in ihad_agg_sa3$sa3_code_2016){
  sa3_shape_frame <- sa3_sf[sa3_sf$SA3_CODE16 == i, ]
  if(nrow(sa3_shape_frame) < 1){  next()  }
  npts_q3 <- round(ihad_agg_sa3$approx_number_q3_households[ihad_agg_sa3$sa3_code_2016 == i]/perNhh)
  if(is.na(npts_q3)){next()} 
  if(npts_q3 == 0){next()}
  pts_q3 <- data.frame(spsample(x = sa3_shape_frame,
                             n = npts_q3,
                             type = "random",
                             iter = 1000)@coords) 
  pointCollector[[i]] <- pts_q3
}
pointFrameq3 <- bind_rows(pointCollector, .id = "sa3_code") %>% 
  mutate(IHAD_quartile = as.factor(3))
pointCollector <- list() # cleaning up after oneself

for(i in ihad_agg_sa3$sa3_code_2016){
  sa3_shape_frame <- sa3_sf[sa3_sf$SA3_CODE16 == i, ]
  if(nrow(sa3_shape_frame) < 1){  next()  }
  npts_q4 <- round(ihad_agg_sa3$approx_number_q4_households[ihad_agg_sa3$sa3_code_2016 == i]/perNhh)
  if(is.na(npts_q4)){next()} 
  if(npts_q4 == 0){next()}
  pts_q4 <- data.frame(spsample(x = sa3_shape_frame,
                             n = npts_q4,
                             type = "random",
                             iter = 1000)@coords) 
  pointCollector[[i]] <- pts_q4
}
pointFrameq4 <- bind_rows(pointCollector, .id = "sa3_code") %>% 
  mutate(IHAD_quartile = as.factor(4))
pointCollector <- list() # cleaning up after oneself


pointFrameSA3aus <- rbind(pointFrameq1, pointFrameq2, pointFrameq3, pointFrameq4)

dim(pointFrameSA3aus) # just a check
approx_total_households <- ihad_agg_sa3$dwellings_total %>% sum(na.rm = TRUE) %>% divide_by(perNhh)
percent_covered_by_dots <- dim(pointFrameSA3aus)[1]/approx_total_households
percent_covered_by_dots
# head(pointFrameSA1aus)

randomly <- function(x) sample(xtfrm(x))

pointFrameSA3aus <- pointFrameSA3aus %>% 
  arrange(randomly(sa3_code))

corresp_sa3_to_gcc <- tibble(sa3_code = sa32016$sa3_code_2016, 
                             gcc_name_2016 = sa32016$gcc_name_2016)

pointFrameSA3aus <- pointFrameSA3aus %>% 
  full_join(corresp_sa3_to_gcc, by = "sa3_code")



#```
#And now we plot a giant map, covering all of Australia.

#```{r Interactive map of all quartiles, results="show"}

# Removing the SA2s with few households (such as Braeside) and getting the file ready for tmap
sf_sa3_trim <- ihad_agg_sa3 %>% 
  select(sa3_name_2016, everything()) %>% 
  mutate(perc_Q1 = ifelse(dwellings_total >= 100, round(percentage_q1, 1), NA),
         perc_Q2 = ifelse(dwellings_total >= 100, round(percentage_q2, 1), NA),
         perc_Q3 = ifelse(dwellings_total >= 100, round(percentage_q3, 1), NA),
         perc_Q4 = ifelse(dwellings_total >= 100, round(percentage_q4, 1), NA),
         approx_total_households = round(dwellings_total),
         note = paste0("One dot represents ", perNhh, " households.")) %>% 
  st_as_sf()

# now we convert the household locations into longitude and latitude

points_to_plot_aus <- pointFrameSA3aus %>% 
#  mutate(x = ifelse(is.na(x), coords.x1, x),
#         y = ifelse(is.na(y), coords.x2, y)) %>% 
#  select(-c(coords.x1, coords.x2)) %>%
  filter(is.na(x) == FALSE, is.na(y) == FALSE) %>% 
   st_as_sf(coords = c('x', 'y'))

# and we plot!

tmap_mode("view")

int_map_sa3_aus <- tm_shape(sf_sa3_trim) + 
  tm_fill(col = "white",  # this time, I don't actually want to shade anything in, I want to dot colour to do it
          alpha = 0,
          popup.vars = c("sa3_name_2016", "perc_Q1", "perc_Q2", "perc_Q3", "perc_Q4", "approx_total_households", "note")) +
  tm_borders(col = grattan_grey1, 
             lwd = 0.5) +
  
  tm_shape(points_to_plot_aus) +
  tm_dots(col = "IHAD_quartile",
          palette = grattan_pal(4, reverse = TRUE),
          border.alpha = 0,
          alpha = 0.5, 
          size = 0.0001, #set to 0.1 or 0.01 if you turn on dot.size.fixed 
          shape = 16) +  #picks black circles
  
  tm_view(dot.size.fixed = FALSE) # prevents dots from scaling as you zoom

int_map_sa1_aus

 tmap_save(tm = int_map_sa3_aus, filename = paste0("Interactive_sa3_map_of_disadvantage_and_advantage_in_Australia_", perNhh, ".html"))

 #bulky, and simulating household location over big SA3s is geographically misleading, especially in outback Australia.

Plotting the whole country is way too ambitious (files are unmanagably large). Let’s break it down into capital cities only. Be warned though; simulating at the SA1 level takes a LONG time. This chunk in total takes at least five minutes. Go make some tea, or write the code to simulate at the SA2 level instead.

# Set GCC to speed up the process

citylist <- c("Sydney", "Melbourne", "Brisbane")
for(city in citylist){

# My thinking is that SA1 simulations are too computationally intensive and we should probably just use SA2s.   
perNhh <- 50
pointCollector <- list()

# Create one national shapefile
sa1_sf <- SA1_2016
temp <- SA1_2016@data %>% as.data.table()
sa1_sf@data <- temp[ , SA1_MAIN16 := SA1_MAIN16 %>% as.character()]
sa1_sf <- sa1_sf[sa1_sf$AREASQKM16 > 10^-4, ] # remove those pesky SA1s with zero area (like migratory - offshore or no usual address)

sa1_sf_gcc <- sa1_sf[sa1_sf$GCC_NAME16 == paste0("Greater ", city), ]

ihad_sa1_gcc <- ihad_sa1 %>%
  mutate(approx_number_q2_households = round(dwellings_total*percentage_q2/100),
         approx_number_q3_households = round(dwellings_total*percentage_q3/100),
         approx_number_q4_households = round(dwellings_total*percentage_q4/100)) %>% 
  filter(areasqkm_2016 > 10^-4,
         gcc_name_2016 == paste0("Greater ", city))

for(i in ihad_sa1_gcc$sa1_main_2016){
  sa1_shape_frame <- sa1_sf[sa1_sf$SA1_MAIN16 == i, ]
  if(nrow(sa1_shape_frame) < 1){  next()  }
  npts_q1 <- round(ihad_sa1_gcc$approx_number_q1_households[ihad_sa1_gcc$sa1_main_2016 == i]/perNhh)
  if(is.na(npts_q1)){next()} 
  if(npts_q1 == 0){next()}
  pts_q1 <- data.frame(spsample(x = sa1_shape_frame,
                             n = npts_q1,
                             type = "random",
                             iter = 10)@coords) 
  pointCollector[[i]] <- pts_q1
}
pointFrameq1 <- bind_rows(pointCollector, .id = "sa1_main") %>% 
  mutate(IHAD_quartile = as.factor(1))
pointCollector <- list() # cleaning up after oneself

for(i in ihad_sa1_gcc$sa1_main_2016){
  sa1_shape_frame <- sa1_sf[sa1_sf$SA1_MAIN16 == i, ]
  if(nrow(sa1_shape_frame) < 1){  next()  }
  npts_q2 <- round(ihad_sa1_gcc$approx_number_q2_households[ihad_sa1_gcc$sa1_main_2016 == i]/perNhh)
  if(is.na(npts_q2)){next()} 
  if(npts_q2 == 0){next()}
  pts_q2 <- data.frame(spsample(x = sa1_shape_frame,
                             n = npts_q2,
                             type = "random",
                             iter = 10)@coords) 
  pointCollector[[i]] <- pts_q2
}
pointFrameq2 <- bind_rows(pointCollector, .id = "sa1_main")%>% 
  mutate(IHAD_quartile = as.factor(2))
pointCollector <- list() # cleaning up after oneself

for(i in ihad_sa1_gcc$sa1_main_2016){
  sa1_shape_frame <- sa1_sf[sa1_sf$SA1_MAIN16 == i, ]
  if(nrow(sa1_shape_frame) < 1){  next()  }
  npts_q3 <- round(ihad_sa1_gcc$approx_number_q3_households[ihad_sa1_gcc$sa1_main_2016 == i]/perNhh)
  if(is.na(npts_q3)){next()} 
  if(npts_q3 == 0){next()}
  pts_q3 <- data.frame(spsample(x = sa1_shape_frame,
                             n = npts_q3,
                             type = "random",
                             iter = 10)@coords) 
  pointCollector[[i]] <- pts_q3
}
pointFrameq3 <- bind_rows(pointCollector, .id = "sa1_main") %>% 
  mutate(IHAD_quartile = as.factor(3))
pointCollector <- list() # cleaning up after oneself

for(i in ihad_sa1_gcc$sa1_main_2016){
  sa1_shape_frame <- sa1_sf[sa1_sf$SA1_MAIN16 == i, ]
  if(nrow(sa1_shape_frame) < 1){  next()  }
  npts_q4 <- round(ihad_sa1_gcc$approx_number_q4_households[ihad_sa1_gcc$sa1_main_2016 == i]/perNhh)
  if(is.na(npts_q4)){next()} 
  if(npts_q4 == 0){next()}
  pts_q4 <- data.frame(spsample(x = sa1_shape_frame,
                             n = npts_q4,
                             type = "random",
                             iter = 10)@coords) 
  pointCollector[[i]] <- pts_q4
}
pointFrameq4 <- bind_rows(pointCollector, .id = "sa1_main") %>% 
  mutate(IHAD_quartile = as.factor(4))
pointCollector <- list() # cleaning up after oneself


pointFrameSA1gcc <- rbind(pointFrameq1, pointFrameq2, pointFrameq3, pointFrameq4)

dim(pointFrameSA1gcc) # just a check
approx_total_households <- ihad_sa1_gcc$dwellings_total %>% sum(na.rm = TRUE) %>% divide_by(perNhh)
percent_covered_by_dots <- dim(pointFrameSA1gcc)[1]/approx_total_households
percent_covered_by_dots


randomly <- function(x) sample(xtfrm(x))

pointFrameSA1gcc <- pointFrameSA1gcc %>% 
  arrange(randomly(sa1_main))

corresp_sa1_to_gcc <- tibble(sa1_main = sa12016$sa1_main_2016, 
                             gcc_name_2016 = sa12016$gcc_name_2016)

pointFrameSA1gcc <- pointFrameSA1gcc %>% 
  full_join(corresp_sa1_to_gcc, by = "sa1_main")


 # tidy up the coordinate pairs

pointFrameSA1gcc <- pointFrameSA1gcc %>%
  filter(gcc_name_2016 == paste0("Greater ", city)) %>% 
  mutate(x = ifelse(is.na(x), coords.x1, x),
         y = ifelse(is.na(y), coords.x2, y)) %>% 
  select(-c(coords.x1, coords.x2)) %>%
  filter(is.na(x) == FALSE, is.na(y) == FALSE) 

write_csv(pointFrameSA1gcc, path = paste0("Output/SA1_simulated_dots_", city, ".csv"))


}

Having simulated (which took a WHILE), we now plot.

# and we plot!
for(city in citylist){
  
  
sf_sa2_gcc_trim <- ihad_agg %>% 
  select(sa2_name_2016, everything()) %>% 
 mutate(perc_Q1 = ifelse(dwellings_total >= 100, round(percentage_q1, 1), NA),
         perc_Q2 = ifelse(dwellings_total >= 100, round(percentage_q2, 1), NA),
         perc_Q3 = ifelse(dwellings_total >= 100, round(percentage_q3, 1), NA),
         perc_Q4 = ifelse(dwellings_total >= 100, round(percentage_q4, 1), NA),
         approx_total_households = round(dwellings_total),
         note_A = paste0("One dot represents roughly ", perNhh, " households."),
         note_B = paste0("These dots cover ", round(100*percent_covered_by_dots, 1), " per cent of the city's households.")) %>% 
  filter(gcc_name_2016 == paste0("Greater ", city)) %>% 
  st_as_sf()

# and let's add an SA1 map below too.

sf_sa1_gcc <- ihad_sa1 %>% 
  filter(gcc_name_2016 == paste0("Greater ", city)) %>% 
  st_as_sf()

# now we read in our simulated data and convert the household locations into longitude and latitude
  
path <-   paste0("Output/SA1_simulated_dots_", city, ".csv")
data <- read_csv(path) %>%
  mutate(IHAD_quartile = as.factor(IHAD_quartile))

points_to_plot_gcc <- data %>% 
   st_as_sf(coords = c('x', 'y'))

# and now we plot!

tmap_mode("view")

int_map_sa1_gcc <- tm_shape(sf_sa2_gcc_trim) + 
  tm_fill(col = "white",  # this time, I don't actually want to shade anything in, I want to dot colour to do it
          alpha = 0,
          popup.vars = c("sa2_name_2016", "sa3_name_2016", "perc_Q1", "perc_Q2", "perc_Q3", "perc_Q4", "approx_total_households", "note_A", "note_B")) +
  tm_borders(col = grattan_grey4, 
             lwd = 0.5) +
  
#  tm_shape(sf_sa1_gcc) + # this includes an SA1 map that can be turned on or off. But it also makes the file much larger and a bit slower. 
#  tm_borders(col = "black") +
  
  tm_shape(points_to_plot_gcc) +
  tm_dots(col = "IHAD_quartile",
          palette = grattan_pal(4, reverse = TRUE),
          border.alpha = 0,
          alpha = 0.8, 
          size = 0.0005, #set to 0.1 or 0.01 if you turn on dot.size.fixed 
          shape = 16) +  #picks circles
  
  tm_view(dot.size.fixed = FALSE) # prevents dots from scaling as you zoom

int_map_sa1_gcc

assign(x = paste0("int_map_all_quartiles_sa1_",city), value = int_map_sa1_gcc, envir = .GlobalEnv)
assign(x = paste0("pointFrameSA1", city), value = pointFrameSA1gcc, envir = globalenv())

 tmap_save(tm = int_map_sa1_gcc, filename = paste0("Interactive_SA1_map_of_disadvantage_and_advantage_in_", city, "_", perNhh, "_unfixed.html"))
 


}


int_map_all_quartiles_sa1_Sydney
int_map_all_quartiles_sa1_Melbourne
int_map_all_quartiles_sa1_Brisbane

Let’s simulate at the SA2 level to speed up the process and reduce the file size, acknowledging the loss of locational accuracy that will result.

# Set GCC to speed up the process

citylist <- c("Sydney", "Melbourne", "Brisbane")
for(city in citylist){

perNhh <- 50
pointCollector <- list()

# Create one national shapefile
sa2_sf <- SA2_2016
temp <- SA2_2016@data %>% as.data.table()
sa2_sf@data <- temp[ , SA2_MAIN16 := SA2_MAIN16 %>% as.character()]
sa2_sf <- sa2_sf[sa2_sf$AREASQKM16 > 10^-4, ] # remove those pesky SA1s with zero area (like migratory - offshore or no usual address)

sa2_sf_gcc <- sa2_sf[sa2_sf$GCC_NAME16 == paste0("Greater ", city), ]

ihad_sa2_gcc <- ihad_agg %>%
  filter(areasqkm_2016 > 10^-4,
         gcc_name_2016 == paste0("Greater ", city))

for(i in ihad_sa2_gcc$sa2_main_2016){
  sa2_shape_frame <- sa2_sf[sa2_sf$SA2_MAIN16 == i, ]
  if(nrow(sa2_shape_frame) < 1){  next()  }
  npts_q1 <- round(ihad_sa2_gcc$approx_number_q1_households[ihad_sa2_gcc$sa2_main_2016 == i]/perNhh)
  if(is.na(npts_q1)){next()} 
  if(npts_q1 == 0){next()}
  pts_q1 <- data.frame(spsample(x = sa2_shape_frame,
                             n = npts_q1,
                             type = "random",
                             iter = 10)@coords) 
  pointCollector[[i]] <- pts_q1
}
pointFrameq1 <- bind_rows(pointCollector, .id = "sa2_main") %>% 
  mutate(IHAD_quartile = as.factor(1))
if( "coords.x1" %in%  names(pointFrameq1)){    # we have a problem where SOMETIMES a coordinate is labelled "coords.x1" not "x", and in those situtions we need to convert them back to "x"
  pointFrameq1 <- pointFrameq1 %>% 
  mutate(x = ifelse(is.na(x), coords.x1, x),
         y = ifelse(is.na(y), coords.x2, y)) %>% 
  select(-c(coords.x1, coords.x2)) %>%
  filter(is.na(x) == FALSE, is.na(y) == FALSE)
}
pointCollector <- list() # cleaning up after oneself

for(i in ihad_sa2_gcc$sa2_main_2016){
  sa2_shape_frame <- sa2_sf[sa2_sf$SA2_MAIN16 == i, ]
  if(nrow(sa2_shape_frame) < 1){  next()  }
  npts_q2 <- round(ihad_sa2_gcc$approx_number_q2_households[ihad_sa2_gcc$sa2_main_2016 == i]/perNhh)
  if(is.na(npts_q2)){next()} 
  if(npts_q2 == 0){next()}
  pts_q2 <- data.frame(spsample(x = sa2_shape_frame,
                             n = npts_q2,
                             type = "random",
                             iter = 10)@coords) 
  pointCollector[[i]] <- pts_q2
}
pointFrameq2 <- bind_rows(pointCollector, .id = "sa2_main")%>% 
  mutate(IHAD_quartile = as.factor(2))
if( "coords.x1" %in%  names(pointFrameq2)){
  pointFrameq1 <- pointFrameq1 %>% 
  mutate(x = ifelse(is.na(x), coords.x1, x),
         y = ifelse(is.na(y), coords.x2, y)) %>% 
  select(-c(coords.x1, coords.x2)) %>%
  filter(is.na(x) == FALSE, is.na(y) == FALSE)
}
pointCollector <- list() # cleaning up after oneself

for(i in ihad_sa2_gcc$sa2_main_2016){
  sa2_shape_frame <- sa2_sf[sa2_sf$SA2_MAIN16 == i, ]
  if(nrow(sa2_shape_frame) < 1){  next()  }
  npts_q3 <- round(ihad_sa2_gcc$approx_number_q3_households[ihad_sa2_gcc$sa2_main_2016 == i]/perNhh)
  if(is.na(npts_q3)){next()} 
  if(npts_q3 == 0){next()}
  pts_q3 <- data.frame(spsample(x = sa2_shape_frame,
                             n = npts_q3,
                             type = "random",
                             iter = 10)@coords) 
  pointCollector[[i]] <- pts_q3
}
pointFrameq3 <- bind_rows(pointCollector, .id = "sa2_main") %>% 
  mutate(IHAD_quartile = as.factor(3))
if( "coords.x1" %in%  names(pointFrameq3)){
  pointFrameq1 <- pointFrameq1 %>% 
  mutate(x = ifelse(is.na(x), coords.x1, x),
         y = ifelse(is.na(y), coords.x2, y)) %>% 
  select(-c(coords.x1, coords.x2)) %>%
  filter(is.na(x) == FALSE, is.na(y) == FALSE)
}
pointCollector <- list() # cleaning up after oneself

for(i in ihad_sa2_gcc$sa2_main_2016){
  sa2_shape_frame <- sa2_sf[sa2_sf$SA2_MAIN16 == i, ]
  if(nrow(sa2_shape_frame) < 1){  next()  }
  npts_q4 <- round(ihad_sa2_gcc$approx_number_q4_households[ihad_sa2_gcc$sa2_main_2016 == i]/perNhh)
  if(is.na(npts_q4)){next()} 
  if(npts_q4 == 0){next()}
  pts_q4 <- data.frame(spsample(x = sa2_shape_frame,
                             n = npts_q4,
                             type = "random",
                             iter = 10)@coords) 
  pointCollector[[i]] <- pts_q4
}
pointFrameq4 <- bind_rows(pointCollector, .id = "sa2_main") %>% 
  mutate(IHAD_quartile = as.factor(4))
if( "coords.x1" %in%  names(pointFrameq4)){
  pointFrameq1 <- pointFrameq1 %>% 
  mutate(x = ifelse(is.na(x), coords.x1, x),
         y = ifelse(is.na(y), coords.x2, y)) %>% 
  select(-c(coords.x1, coords.x2)) %>%
  filter(is.na(x) == FALSE, is.na(y) == FALSE)
}
pointCollector <- list() # cleaning up after oneself


pointFrameSA2gcc <- rbind(pointFrameq1, pointFrameq2, pointFrameq3, pointFrameq4)

dim(pointFrameSA2gcc) # just a check
approx_total_households <- ihad_sa2_gcc$dwellings_total %>% sum(na.rm = TRUE) %>% divide_by(perNhh)
percent_covered_by_dots <- dim(pointFrameSA2gcc)[1]/approx_total_households
percent_covered_by_dots


randomly <- function(x) sample(xtfrm(x))

pointFrameSA2gcc <- pointFrameSA2gcc %>% 
  arrange(randomly(sa2_main))

corresp_sa2_to_gcc <- tibble(sa2_main = sa22016$sa2_main_2016, 
                             gcc_name_2016 = sa22016$gcc_name_2016)

pointFrameSA2gcc <- pointFrameSA2gcc %>% 
  full_join(corresp_sa2_to_gcc, by = "sa2_main")



#```

#```{r Interactive map of all quartiles by GCC, results="show"}

# Let's get the files ready for tmap, and split them in GCCs

# We set the city in the last chunk.

sf_sa2_gcc_trim <- ihad_agg %>% 
  select(sa2_name_2016, everything()) %>% 
 mutate(perc_Q1 = ifelse(dwellings_total >= 100, round(percentage_q1, 1), NA),
         perc_Q2 = ifelse(dwellings_total >= 100, round(percentage_q2, 1), NA),
         perc_Q3 = ifelse(dwellings_total >= 100, round(percentage_q3, 1), NA),
         perc_Q4 = ifelse(dwellings_total >= 100, round(percentage_q4, 1), NA),
         approx_total_households = round(dwellings_total),
         note_A = paste0("One dot represents roughly ", perNhh, " households."),
         note_B = paste0("These dots cover ", round(100*percent_covered_by_dots, 1), " per cent of the city's households.")) %>% 
  filter(gcc_name_2016 == paste0("Greater ", city)) %>% 
  st_as_sf()


# now we convert the household locations into longitude and latitude

points_to_plot_gcc <- pointFrameSA2gcc %>%
  filter(is.na(x) == FALSE) %>% 
   st_as_sf(coords = c('x', 'y'))

# and we plot!

tmap_mode("view")

int_map_sa2_gcc <- tm_shape(sf_sa2_gcc_trim) + 
  tm_fill(col = "white",  # this time, I don't actually want to shade anything in, I want to dot colour to do it
          alpha = 0,
          popup.vars = c("sa2_name_2016", "sa3_name_2016", "perc_Q1", "perc_Q2", "perc_Q3", "perc_Q4", "approx_total_households", "note_A", "note_B")) +
  tm_borders(col = grattan_grey4, 
             lwd = 0.5) +
  
  
  tm_shape(points_to_plot_gcc) +
  tm_dots(col = "IHAD_quartile",
          palette = grattan_pal(4, reverse = TRUE),
          border.alpha = 0,
          alpha = 0.8, 
          size = 0.0005, #set to 0.1 or 0.01 if you turn on dot.size.fixed 
          shape = 16) +  #picks circles
  
  tm_view(dot.size.fixed = FALSE) # prevents dots from scaling as you zoom

int_map_sa2_gcc

assign(x = paste0("int_map_all_quartiles_sa2_",city), value = int_map_sa2_gcc, envir = globalenv())

 tmap_save(tm = int_map_sa2_gcc, filename = paste0("Interactive_SA2_map_of_disadvantage_and_advantage_in_", city, "_", perNhh, "_unfixed.html"))

}

int_map_all_quartiles_sa2_Sydney
int_map_all_quartiles_sa2_Melbourne
int_map_all_quartiles_sa2_Brisbane

Simulating at the SA2 level halved the file size, but maybe that’s also because we sampled at 100 homes per dot, not 50. Let’s try again sampling at 50 and see if there’s any difference.

…

Okay so we’re back to the old file sizes. Sampling at the SA2 level doesn’t help reduce filesize if we keep the same number of dots. As such, there’s no real benefit to sampling at the SA2 level, especially given the loss of locational accuracy.

Then let’s make a static map at the SA1 level.

citylist <- c("Sydney", "Melbourne", "Brisbane")
for(city in citylist){
 
 
map <- ihad_agg_sa3 %>%
  filter(gcc_name_2016 == paste0("Greater ", city))
  
map_all_quartiles <- ggplot(map)
map_all_quartiles <- map_all_quartiles + theme_void()
map_all_quartiles <- map_all_quartiles + geom_sf(aes(geometry = geometry),
                                                 fill = "white",
                                                 lwd = 0.1)

path <- paste0("Output/SA1_simulated_dots_", city, ".csv")

data <- read_csv(file = path) %>%
  mutate(IHAD_quartile = as.factor(IHAD_quartile))

map_all_quartiles <- map_all_quartiles + geom_point(data = data,
                                                    aes(x = x, y = y, colour = IHAD_quartile),
                                                    shape = ".",
                                                    alpha = 0.75)

map_all_quartiles <- map_all_quartiles + scale_colour_manual(values = grattan_pal(4, reverse = TRUE))


map_all_quartiles <- map_all_quartiles + theme(legend.position = "right", 
                                               legend.direction = "vertical",
                                               legend.text = element_text(size = 12),
                                               axis.title = element_blank(),
                                               plot.title = element_text(hjust = 0.5), 
                                               plot.subtitle = element_text(hjust = 0.5),
                                               plot.caption = element_text(hjust = 0.5)
                                               )

map_all_quartiles <- map_all_quartiles +  coord_sf(datum = NA, crs = "+init=epsg:4238")
map_all_quartiles <- map_all_quartiles +  labs(title = paste0("Advantage and disadvantage across Greater ", city), 
                                              subtitle = "Lighter colour indicates more advantage",
                                              fill = "IHAD quartile",
                                              caption = paste0("Each dot represents ", perNhh, " households.\nBased on Grattan Institute analysis of ABS data."))

assign(paste0("All_quartiles_", city), value = map_all_quartiles, envir = globalenv())

}

All_quartiles_Sydney

All_quartiles_Melbourne

All_quartiles_Brisbane